Evaluating Gender Bias in Speech Translation
The scientific community is increasingly aware of the necessity to embrace
pluralism and consistently represent major and minor social groups. Currently,
there are no standard evaluation techniques for different types of biases.
Accordingly, there is an urgent need for evaluation sets and protocols
to measure the biases present in our automatic systems; evaluating these biases
is an essential first step towards mitigating them.
This paper introduces WinoST, a new freely available challenge set for
evaluating gender bias in speech translation. WinoST is the speech version of
WinoMT, an MT challenge set, and both follow the same evaluation protocol to
measure gender accuracy. Using a state-of-the-art end-to-end speech translation
system, we report the gender bias evaluation on four language pairs and show
that gender accuracy in speech translation is more than 23% lower than in MT.
Comment: Preprint
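The gender-accuracy protocol mentioned above reduces to checking, for each annotated entity, whether the gender realised in the system output matches the gold annotation. A minimal sketch (function name and label scheme are illustrative, not the actual WinoMT/WinoST tooling):

```python
def gender_accuracy(predicted, gold):
    """Fraction of annotated entities whose predicted gender label
    matches the gold label (e.g. 'M'/'F'), WinoMT-style (sketch)."""
    assert len(predicted) == len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# e.g. four annotated entities, one translated with the wrong gender
acc = gender_accuracy(["F", "M", "F", "M"], ["F", "M", "F", "F"])  # 0.75
```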
Sign Language Translation from Instructional Videos
The advances in automatic sign language translation (SLT) to spoken languages
have been mostly benchmarked with datasets of limited size and restricted
domains. Our work advances the state of the art by providing the first baseline
results on How2Sign, a large and broad dataset.
We train a Transformer over I3D video features, using the reduced BLEU as the
validation metric instead of the widely used BLEU score. We report a BLEU score
of 8.03 and publish the first open-source implementation of its kind to promote
further advances.
Comment: Paper accepted at WiCV @CVPR2
On the Locality of Attention in Direct Speech Translation
Transformers have achieved state-of-the-art results across multiple NLP
tasks. However, the complexity of the self-attention mechanism scales
quadratically with sequence length, an obstacle for tasks involving long
sequences, such as those in the speech domain. In this paper, we discuss the usefulness
of self-attention for Direct Speech Translation. First, we analyze the
layer-wise token contributions in the self-attention of the encoder, unveiling
local diagonal patterns. To prove that some attention weights are avoidable, we
propose to substitute the standard self-attention with a local efficient one,
setting the amount of context used based on the results of the analysis. With
this approach, our model matches the baseline performance, and improves the
efficiency by skipping the computation of those weights that standard attention
discards.
Comment: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando
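The local-attention substitution described above can be pictured as a banded mask over the attention matrix: each position attends only to a fixed window of neighbours, matching the local diagonal patterns found in the analysis. A minimal sketch (the window size is an illustrative parameter, not the paper's setting):

```python
import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask allowing each position to attend only to
    neighbours within +/- `window` positions, i.e. a diagonal band."""
    idx = np.arange(seq_len)
    # |i - j| <= window keeps the band; everything else is masked out
    return np.abs(idx[:, None] - idx[None, :]) <= window

mask = local_attention_mask(6, 1)
# each row allows at most 3 positions: the token itself and its two neighbours
```

Applying this mask before the softmax lets an implementation skip computing the masked weights entirely, which is where the efficiency gain comes from.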
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
Speech translation models are unable to directly process long audios, like
TED talks, which have to be split into shorter segments. Speech translation
datasets provide manual segmentations of the audios, which are not available in
real-world scenarios, and existing segmentation methods usually significantly
reduce translation quality at inference time. To bridge the gap between the
manual segmentation of training and the automatic one at inference, we propose
Supervised Hybrid Audio Segmentation (SHAS), a method that can effectively
learn the optimal segmentation from any manually segmented speech corpus.
First, we train a classifier to identify the included frames in a segmentation,
using speech representations from a pre-trained wav2vec 2.0. The optimal
splitting points are then found by a probabilistic Divide-and-Conquer algorithm
that progressively splits at the frame of lowest probability until all segments
are below a pre-specified length. Experiments on MuST-C and mTEDx show that the
translation of the segments produced by our method approaches the quality of
the manual segmentation on five language pairs. Namely, SHAS retains 95-98% of
the manual segmentation's BLEU score, compared to the 87-93% of the best
existing methods. Our method additionally generalizes to different domains
and achieves high zero-shot performance on unseen languages.
Comment: Submitted to Interspeech 2022, 5 pages. Previous version (v1) additionally has a 2-page Appendix
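The splitting procedure described above can be sketched as a recursive divide-and-conquer over the classifier's per-frame probabilities: while a segment is too long, cut it at the interior frame the classifier is least confident about. A minimal sketch (the paper's probabilistic algorithm also trims low-probability frames at the boundaries, which is omitted here):

```python
def divide_and_conquer(probs, max_len):
    """Split [0, len(probs)) into segments of at most `max_len` frames,
    always cutting at the frame with the lowest inclusion probability."""
    segments = []

    def split(start, end):
        if end - start <= max_len:
            segments.append((start, end))
            return
        # cut at the interior frame least likely to belong to a segment
        cut = min(range(start + 1, end), key=lambda i: probs[i])
        split(start, cut)
        split(cut, end)

    split(0, len(probs))
    return segments

probs = [0.9, 0.9, 0.1, 0.9, 0.9, 0.2, 0.9, 0.9]
divide_and_conquer(probs, 3)  # [(0, 2), (2, 5), (5, 8)]
```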
Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23
This paper describes the submission of the UPC Machine Translation group to
the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems
utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We
incorporate a Siamese pretraining step of the speech and text encoders with CTC
and Optimal Transport, to adapt the speech representations to the space of the
text model, thus maximizing transfer learning from MT. After this pretraining,
we fine-tune our system end-to-end on ST, with Cross Entropy and Knowledge
Distillation. Apart from the available ST corpora, we create synthetic data
with SegAugment to better adapt our models to the custom segmentations of the
IWSLT test sets. Our best single model obtains 31.2 BLEU points on MuST-C
tst-COMMON, 29.8 points on IWSLT.tst2020, and 33.4 points on the newly released
IWSLT.ACLdev2023.
Comment: IWSLT 2023
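The Optimal Transport component of the Siamese pretraining can be illustrated with entropy-regularised OT (the Sinkhorn algorithm) between speech- and text-encoder states: the OT cost measures how far the two representation sets are from each other. A minimal sketch on a Euclidean cost matrix (the submission's actual loss and hyperparameters are not specified here):

```python
import numpy as np

def sinkhorn_ot(speech_states, text_states, eps=0.5, n_iters=100):
    """Entropy-regularised OT between two sets of encoder states,
    each treated as a uniform distribution over its time steps.
    Returns the transport plan and the (unregularised) OT cost."""
    # pairwise squared Euclidean costs between speech and text states
    diff = speech_states[:, None, :] - text_states[None, :, :]
    cost = (diff ** 2).sum(-1)
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):  # alternating Sinkhorn scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return plan, float((plan * cost).sum())
```

Minimising this cost pulls the speech representations towards the space of the text model, which is the stated goal of the pretraining step.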
Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer
In Neural Machine Translation (NMT), each token prediction is conditioned on
the source sentence and the target prefix (what has been previously translated
at a decoding step). However, previous work on interpretability in NMT has
focused mainly on the attributions of source-sentence tokens. Therefore, we
lack a full understanding of the influences of every input token (source
sentence and target prefix) in the model predictions. In this work, we propose
an interpretability method that tracks input tokens' attributions for both
contexts. Our method, which can be extended to any encoder-decoder
Transformer-based model, allows us to better comprehend the inner workings of
current NMT models. We apply the proposed method to both bilingual and
multilingual Transformers and present insights into their behaviour.
Comment: EMNLP 202
Efficient Speech Translation with Dynamic Latent Perceivers
Transformers have been the dominant architecture for Speech Translation in
recent years, achieving significant improvements in translation quality. Since
speech signals are longer than their textual counterparts, and due to the
quadratic complexity of the Transformer, a down-sampling step is essential for
its adoption in Speech Translation. In this work, we instead propose to
ease the complexity by using a Perceiver encoder to map the speech inputs to a
fixed-length latent representation. Furthermore, we introduce a novel way of
training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent
spaces without any additional computational overhead. Speech-to-Text Perceivers
with DLA can match the performance of Transformer baselines across three
language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to
DLA at inference, and can be flexibly deployed with various computational
budgets, without significant drops in translation quality.
Comment: ICASSP 202
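Dynamic Latent Access, as described, can be read as subsampling a large latent bank at each training step, so per-step compute stays that of a small Perceiver while the model learns a larger latent space. A minimal sketch (bank and sample sizes are illustrative; the paper's training details are assumed):

```python
import numpy as np

def sample_latents(latent_bank, k, rng):
    """Dynamic Latent Access (sketch): draw k latents from a bank of
    N >> k, so each step costs as much as a k-latent Perceiver while
    the full N-latent bank is trained over time."""
    idx = rng.choice(len(latent_bank), size=k, replace=False)
    return latent_bank[idx]

rng = np.random.default_rng(0)
bank = rng.standard_normal((256, 512))   # N=256 latents of dim 512 (illustrative)
latents = sample_latents(bank, 64, rng)  # per-step cost of a 64-latent model
```

The same sampling interface also suggests why a DLA-trained model adapts to different inference budgets: `k` can be changed at deployment without retraining.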
Molecular diagnostics for Chagas disease: up to date and novel methodologies
Chagas disease is caused by the parasite Trypanosoma cruzi. It affects 7 million people, mainly in Latin America. Diagnosis is usually made serologically, but in some clinical scenarios serology cannot be used; molecular detection is then required for early detection of congenital transmission, follow-up of treatment response, and diagnosis of reactivation under immunosuppression. However, current tests are technically demanding and require well-equipped laboratories, which makes them unfeasible in low-resource endemic regions.
DNA-origami-aided lithography for sub-10 nanometer pattern printing
We report the first DNA-based origami technique that can print addressable patterns on surfaces with sub-10 nm resolution. Specifically, we have used a two-dimensional DNA origami as a template (DNA origami stamp) to transfer DNA with pre-programmed patterns (DNA ink) onto gold surfaces. The DNA ink is composed of thiol-modified staple strands incorporated at specific positions of the DNA origami stamp, which create patterns upon thiol-gold bond formation on the surface. The resulting DNA pattern is composed of unique oligonucleotide sequences, each of which is individually addressable. As a proof of concept, we created a linear pattern of oligonucleotide-modified gold nanoparticles complementary to the DNA ink pattern. We have also developed an in silico model to identify key elements in the formation of our DNA-origami-driven lithography and nanoparticle patterning, as well as to simulate more complex nanoparticle patterns on surfaces.